model-parallel training
Reviews: Ouroboros: On Accelerating Training of Transformer-Based Language Models
The paper introduces a new method for model-parallel training, where the layers of a model are distributed across multiple accelerators. The method avoids locking in the backward pass by using stale gradients during back-propagation. I'm not aware of any prior work that has taken this approach. Furthermore, the authors provide theoretical analysis and empirical results to demonstrate that, despite using stale gradients, their method has convergence properties similar to conventional SGD. The lack of effective model-parallel training is a major roadblock to scaling up model sizes, and the proposed approach promises to overcome this issue.
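The core idea can be illustrated with a toy experiment (my own sketch, not the paper's implementation): run gradient descent on a small least-squares problem, but let each update use a gradient computed from the parameters as they were a few steps earlier, which is the kind of staleness a partition sees when it does not wait for the full backward pass. The problem, the `delay` value, and the learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))      # toy least-squares problem: minimize ||Ax - b||^2
b = rng.normal(size=8)

def grad(x):
    return 2.0 * A.T @ (A @ x - b)

delay = 3                         # staleness, loosely analogous to the pipeline depth
lr = 0.005
x = np.zeros(4)
history = [x.copy()]              # past iterates, needed to form stale gradients

for t in range(400):
    stale_x = history[max(0, t - delay)]   # parameters from `delay` steps ago
    x = x - lr * grad(stale_x)             # update with the stale gradient
    history.append(x.copy())

print("final loss:", np.sum((A @ x - b) ** 2))
```

For a small enough step size the delayed updates still drive the loss down, which matches the intuition behind the paper's claim that staleness need not hurt convergence.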
A guide to the field of Deep Learning
Since the list has gotten rather long, I have included an excerpt above; the full list is at the bottom of this post. At the entry level, the datasets are small and often fit easily into main memory. If they don't already come pre-processed, applying the necessary pre-processing takes only a few lines of code (a minimal example follows below). You'll mainly work with the major domains: audio, image, time-series, and text. Before diving into the broad field of Deep Learning, it's a good idea to study these basic techniques first.
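As a purely illustrative example of such a "few lines" of pre-processing, here is how a small in-memory image dataset might be scaled and one-hot encoded with Keras; the choice of MNIST and the exact transforms are my assumptions, not something prescribed by the guide.

```python
from tensorflow import keras

# Load a small dataset that fits comfortably in main memory.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale pixel values to [0, 1] and add a channel dimension.
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# One-hot encode the labels.
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)
```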
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow
Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda
The enormous amount of data and computation required to train DNNs has led to the rise of various parallelization strategies. Broadly, there are two strategies: 1) Data-Parallelism -- replicating the DNN on multiple processes and training on different training samples, and 2) Model-Parallelism -- dividing elements of the DNN itself into partitions across different processes. While data-parallelism has been extensively studied and developed, model-parallelism has received less attention, as it is non-trivial to split the model across processes. In this paper, we propose HyPar-Flow: a framework for scalable and user-transparent parallel training of very large DNNs (up to 5,000 layers). We exploit TensorFlow's Eager Execution features and Keras APIs for model definition and distribution. HyPar-Flow exposes a simple API to offer data, model, and hybrid (model + data) parallel training for models defined using the Keras API. Under the hood, we introduce MPI communication primitives like send and recv on layer boundaries for data exchange between model-partitions and allreduce for gradient exchange across model-replicas. Our proposed designs in HyPar-Flow offer up to 3.1x speedup over sequential training for ResNet-110 and up to 1.6x speedup over Horovod-based data-parallel training for ResNet-1001, a model that has 1,001 layers and 30 million parameters. We provide an in-depth performance characterization of the HyPar-Flow framework on multiple HPC systems with diverse CPU architectures, including Intel Xeon(s) and AMD EPYC. HyPar-Flow provides a 110x speedup on 128 nodes of the Stampede2 cluster at TACC for hybrid-parallel training of ResNet-1001.
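The communication pattern the abstract describes can be sketched with mpi4py (this is my own illustration of the pattern, not HyPar-Flow's actual API): adjacent model-partitions exchange activations and gradients with point-to-point send/recv at layer boundaries, and an allreduce averages gradients across data-parallel replicas in a hybrid run.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()   # one model-partition per rank

x = np.random.rand(32, 64).astype(np.float32)    # dummy activations

# Forward pass: receive activations from the previous partition, "compute"
# this partition's layers (identity here), and send results downstream.
if rank > 0:
    comm.Recv(x, source=rank - 1)
activations = x                                  # placeholder for real layer computation
if rank < size - 1:
    comm.Send(activations, dest=rank + 1)

# Backward pass: gradients flow in the opposite direction.
grad = np.ones_like(x)
if rank < size - 1:
    comm.Recv(grad, source=rank + 1)
if rank > 0:
    comm.Send(grad, dest=rank - 1)

# Data-parallel part of a hybrid setup: average gradients across replicas.
# A real hybrid run would use a sub-communicator spanning the replicas that
# hold the same partition; COMM_WORLD is used here only to keep the sketch short.
local_grad = np.random.rand(128).astype(np.float32)
avg_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, avg_grad, op=MPI.SUM)
avg_grad /= size
```

Run with, e.g., `mpirun -np 4 python sketch.py`, where each rank stands in for one model-partition.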
Local Critic Training for Model-Parallel Learning of Deep Neural Networks
This paper proposes a novel approach to training deep neural networks in a parallelized manner by unlocking the layer-wise dependency of backpropagation. The approach employs additional modules, called local critic networks, alongside the main network to be trained; these estimate the output of the main network in order to obtain error gradients without complete feedforward and backward propagation. We propose a cascaded learning strategy for these local networks so that different layer groups can be trained in parallel. Experimental results show the effectiveness of the proposed approach and suggest guidelines for determining appropriate algorithm parameters. In addition, we demonstrate that the approach can also be used for structural optimization of neural networks, computationally efficient progressive inference, and ensemble classification for performance improvement.
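A rough PyTorch sketch of the idea (the layer split, critic architecture, and losses are my assumptions, not the paper's code): a small critic attached to the first layer group predicts the final output, the lower group updates from the critic's loss without waiting for the rest of the network, and the critic is separately trained to mimic the main network's actual output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# First layer group, remaining layers, and a small local critic at the split point.
lower  = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
upper  = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10))
critic = nn.Linear(256, 10)

opt_lower  = torch.optim.SGD(lower.parameters(), lr=0.01)
opt_upper  = torch.optim.SGD(upper.parameters(), lr=0.01)
opt_critic = torch.optim.SGD(critic.parameters(), lr=0.01)

x = torch.randn(64, 1, 28, 28)             # dummy batch
y = torch.randint(0, 10, (64,))

h = lower(x)

# (1) Update the lower group from the critic's estimated output,
#     without waiting for the full forward/backward pass.
loss_lower = F.cross_entropy(critic(h), y)
opt_lower.zero_grad()
loss_lower.backward()
opt_lower.step()

# (2) Update the upper group on detached activations, decoupled from (1).
out = upper(h.detach())
loss_upper = F.cross_entropy(out, y)
opt_upper.zero_grad()
loss_upper.backward()
opt_upper.step()

# (3) Train the critic to approximate the main network's output.
opt_critic.zero_grad()
loss_critic = F.mse_loss(critic(h.detach()), out.detach())
loss_critic.backward()
opt_critic.step()
```

Because steps (1) and (2) only share the detached activations, the two layer groups could in principle run their updates on different devices, which is the decoupling the paper exploits.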